In a recent paper, Howard, Ballas, and Burgy (1978) presented a bottom-up model for the classification of complex, steady-state sounds. The model assumes that listeners go through several steps in classifying a presented stimulus. First, the peripheral auditory system constructs an initial, low-level measurement representation of the stimulus. Second, a feature extraction processor transforms these measurements into a vector of higher-order feature elements. Third, a decision processor estimates the perceptual distance between the stimulus and each of a set of category prototypes to determine the likelihood of each category given that stimulus. Finally, the stimulus is classified into the category with the greatest likelihood.

As in previous models of aural classification (Durlach & Braida, 1969), category likelihoods are assumed to be represented by Gaussian density functions over the perceptual space. These likelihood functions may be estimated by fitting Gaussian distributions to confusion data obtained in a classification experiment. In the Howard et al. model, the estimated variance parameters of the Gaussian distributions are particularly important. Unlike earlier classification models, this model obtains one variance parameter for each dimension of the feature space. These parameters are analogous to uncertainty measures in that they describe the extent to which the categories overlap in the perceptual space.
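
The decision stage described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes a two-dimensional (Tempo, Quality) feature space, a single prototype point per category, and independent Gaussian uncertainty with one standard deviation per dimension. The prototype coordinates, sigma values, and the helper names `log_likelihood` and `classify` are all hypothetical.

```python
import math

def log_likelihood(x, prototype, sigmas):
    """Log density of an independent-dimension Gaussian centred on a category
    prototype, with a separate standard deviation for each perceptual dimension."""
    total = 0.0
    for xi, mi, si in zip(x, prototype, sigmas):
        z = (xi - mi) / si
        total += -0.5 * z * z - math.log(si * math.sqrt(2.0 * math.pi))
    return total

def classify(x, prototypes, sigmas):
    """Return the label of the category with the greatest likelihood for percept x."""
    return max(prototypes, key=lambda label: log_likelihood(x, prototypes[label], sigmas))

# Hypothetical (Tempo, Quality) prototypes; Tempo is given the smaller uncertainty.
PROTOTYPES = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (0.0, 1.0), "D": (1.0, 1.0)}
SIGMAS = (0.5, 1.0)

print(classify((0.8, 0.3), PROTOTYPES, SIGMAS))  # → B
```

Because each sigma scales distances along its own dimension, shrinking one sigma makes the classifier more sensitive to differences along that dimension, which is the sense in which these parameters act as uncertainty measures.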

Previous research with the model (Howard et al., 1978) has shown that listeners improve their classification performance very quickly with practice, and that this improvement is reflected in a corresponding reduction in the estimated uncertainty parameters. The reduction was described as a fine-tuning process that listeners use to maximize the overall probability of correct classification. This tuning process is selective in that it emphasizes features that are important in the classification task; that is, greater uncertainty reduction occurs for more relevant dimensions than for less relevant ones. However, Howard et al. noted that the tuning process does not continue as long as one would expect for optimal classification. In particular, there appeared to be an overall limit on the amount of uncertainty reduction that could occur. Since this limit could not be attributed to absolute sensitivity limitations, Howard et al. (1978) argued that it was determined by a limit on the listener's overall information-processing capacity (e.g., Kahneman, 1973). Given this limit, they concluded that the listener's feature tuning process apportioned his or her attentional effort between the two perceptual features so as to maximize the average probability correct.
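
The capacity argument can be illustrated with a small Python sketch. It assumes, hypothetically, that the four levels of each dimension lie at unit spacing (positions 0 to 3), that the listener places response criteria midway between adjacent levels, that the two dimensions are judged independently, and that each feature's standard deviation is the reciprocal of the attentional weight given to it. Under those assumptions the sketch fixes the total effort and grid-searches the split that maximizes average probability correct for a partition with four Tempo classes and two Quality classes. The optimum it finds depends entirely on these assumed spacings and is not a value reported in the study; the function names are illustrative.

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def p_correct_dimension(n_levels, bounds, sigma):
    """Mean probability that a noisy percept of each level (assumed at positions
    0..n_levels-1) falls in that level's response class; bounds are the sorted
    criteria separating the classes."""
    edges = [-math.inf] + list(bounds) + [math.inf]
    total = 0.0
    for level in range(n_levels):
        cls = sum(b < level for b in bounds)  # response class of the true level
        total += phi((edges[cls + 1] - level) / sigma) - phi((edges[cls] - level) / sigma)
    return total / n_levels

def best_split(total_effort, tempo_bounds, quality_bounds, steps=100):
    """Grid-search the Tempo share of a fixed total effort (sigma = 1 / weight).
    With a full factorial stimulus set and independent dimensions, mean accuracy
    over all 16 stimuli equals the product of the per-dimension means."""
    best_share, best_p = None, -1.0
    for k in range(1, steps):
        share = k / steps
        w_tempo = total_effort * share
        w_quality = total_effort - w_tempo
        p = (p_correct_dimension(4, tempo_bounds, 1.0 / w_tempo)
             * p_correct_dimension(4, quality_bounds, 1.0 / w_quality))
        if p > best_p:
            best_share, best_p = share, p
    return best_share, best_p

# Tempo partition: four Tempo classes, Quality collapsed into two classes.
share, p = best_split(4.0, tempo_bounds=[0.5, 1.5, 2.5], quality_bounds=[1.5])
print(f"Tempo share {share:.2f}, P(correct) {p:.3f}")
```

Under these assumptions the best split gives more than half of the effort to the dimension that must be divided into more classes, which is the qualitative pattern the capacity account predicts.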

The Howard et al. (1978) findings have an alternative explanation that must be explored: the experiment may not have provided enough practice for listeners to fine-tune their sensitivity optimally on both stimulus dimensions. Their listeners performed 720 classifications over three days, which was not enough to conclude that asymptotic levels had been reached. To study the effects of additional practice, an experiment was conducted that provided extensive practice. The stimuli and experimental procedure were similar to those used in the previous study. Listeners were asked to classify sixteen amplitude-modulated noise patterns into one of eight categories. The stimuli, which resembled a broad class of passive sonar signals, varied in both modulation frequency and attack. Howard et al. (1978) have referred to the perceptual correlate of modulation frequency as sound Tempo, and to the perceptual correlate of attack as sound Quality. Some listeners learned a category partition based primarily on signal Tempo (the Tempo group), and others learned a partition based primarily on signal Quality (the Quality group). The aural classification model was used to analyze and interpret the results.

Method 

Participants. Four students were recruited by advertisement. Two were assigned to the Tempo group and two to the Quality group. All four listeners (three female, one male) reported normal hearing. Their pay was determined by a schedule that encouraged high performance.

Apparatus. All experimental events were controlled by a laboratory digital computer. Modulation waveforms were synthesized by the computer and output through a 12-bit digital-to-analog converter at a 5-kHz sampling rate. This signal was applied to the input of a laboratory-constructed transconductance operational amplifier circuit (RCA CA3084). The output gain of the amplifier was directly proportional to the amplitude of the modulation signal. The resulting amplitude-modulated signals were delivered to listeners over matched Telephonics TDH-49 headphones with MX-41/AR cushions.

Stimuli. A set of 16 amplitude-modulated noise signals was generated by combining four levels of modulation frequency (4, 4.8, 5.6, and 6.4 Hz) with four levels of attack (43%, 57%, 71%, and 86% of the period). These sounds differed from those used by Howard et al. (1978) in both modulation frequency and attack. The modulation frequency differences were made smaller to increase task demand and to prevent possible ceiling effects. The attack values were chosen to avoid rise times of 20-40 msec; according to Cutting and Rosner (1974), perception of rise times in this interval involves different processes than perception of rise times outside it.
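
A quick Python check shows how the chosen values stay outside the 20-40 msec interval. It assumes that the attack value is the fraction of each modulation period taken up by the rising portion of the sawtooth, so that rise time = attack fraction × period = attack fraction / modulation frequency; that reading of "attack" is an assumption, not stated explicitly in the text.

```python
FREQS_HZ = [4.0, 4.8, 5.6, 6.4]              # modulation frequencies
ATTACK_FRACTIONS = [0.43, 0.57, 0.71, 0.86]  # attack as a fraction of the period

# Assumed: rise time (msec) = attack fraction * modulation period.
rise_ms = {(f, a): 1000.0 * a / f for f in FREQS_HZ for a in ATTACK_FRACTIONS}

shortest, longest = min(rise_ms.values()), max(rise_ms.values())
in_excluded_band = [k for k, r in rise_ms.items() if 20.0 <= r <= 40.0]
print(f"{shortest:.1f} to {longest:.1f} msec; in 20-40 msec band: {in_excluded_band}")
# → 67.2 to 215.0 msec; in 20-40 msec band: []
```

Under this reading, even the shortest rise time (86% attack is the slowest, 43% at 6.4 Hz the fastest) is well above 40 msec.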

The noise carrier was 20 Hz to 20 kHz white noise (B & K Type 1402 Random Noise Generator). The modulation envelopes had sawtooth waveforms with the attack values indicated above. All signals were presented at about 65 dB SPL. As in the Howard et al. (1978) study, two partitions of the 16 sounds into eight categories (two sounds per category) were constructed. In the Tempo partition, listeners were required to discriminate four levels of Tempo and only two of Quality (43% and 57% vs. 71% and 86%). The second, Quality, partition required four levels of Quality discrimination and two of Tempo discrimination (4 and 4.8 Hz vs. 5.6 and 6.4 Hz).
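
The two partitions can be written down compactly. In this Python sketch the levels are indexed 0-3 in the order listed above, and the integer category numbers are hypothetical stand-ins for the CVC labels listeners actually used.

```python
TEMPO_HZ = [4.0, 4.8, 5.6, 6.4]
QUALITY_ATTACK = [43, 57, 71, 86]  # percent of period

def tempo_partition(t, q):
    """Four Tempo classes x two Quality classes (43/57% vs 71/86%)."""
    return 2 * t + q // 2

def quality_partition(t, q):
    """Four Quality classes x two Tempo classes (4/4.8 Hz vs 5.6/6.4 Hz)."""
    return 2 * q + t // 2

def categories(partition):
    """Map each hypothetical category number to its (Hz, attack %) stimuli."""
    cats = {}
    for t, hz in enumerate(TEMPO_HZ):
        for q, attack in enumerate(QUALITY_ATTACK):
            cats.setdefault(partition(t, q), []).append((hz, attack))
    return cats

for cat, sounds in sorted(categories(tempo_partition).items()):
    print(cat, sounds)
```

Each partition yields eight categories of two sounds; the two differ only in which dimension is collapsed to two classes.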

Procedure. The listeners were tested individually in a sound-attenuated booth. They were told that their task was to learn to classify 16 sounds into eight categories, two sounds per category. No specific instructions were given about how Tempo and Quality were to be used. Each trial began with a visual warning, followed by a 2.5- or 3-sec presentation of one of the sounds. The two signal durations occurred equally often and were included to discourage a simple "peak-counting" strategy. After the signal ended, the listener pressed one of eight keys (labeled with CVC nonsense trigrams of equal association value) to indicate the category of the sound. Feedback was provided after each trial.

All listeners received 192 trials in each of 16 blocks, for a total of 3072 trials. This represents a substantial increase over the 720 trials used in our earlier study. Trials were randomized within each block. Listeners normally completed two consecutive blocks a day.
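
The trial bookkeeping can be sketched as follows. The text states only that trials were randomized within blocks and that the two durations occurred equally often; this Python sketch additionally assumes, hypothetically, that every sound-by-duration combination appears equally often in each block (16 × 2 × 6 = 192 trials), and the day count is the nominal two blocks per day.

```python
import random
from collections import Counter

N_SOUNDS, DURATIONS_S = 16, (2.5, 3.0)
TRIALS_PER_BLOCK, N_BLOCKS, BLOCKS_PER_DAY = 192, 16, 2

def make_block(rng):
    """One block: each (sound, duration) pair repeated equally often, then shuffled.
    Equal repetition of every pair is an assumption, not stated in the text."""
    reps = TRIALS_PER_BLOCK // (N_SOUNDS * len(DURATIONS_S))  # 6 under this assumption
    trials = [(s, d) for s in range(N_SOUNDS) for d in DURATIONS_S for _ in range(reps)]
    rng.shuffle(trials)  # randomize order within the block
    return trials

rng = random.Random(1978)  # fixed seed so the schedule is reproducible
schedule = [make_block(rng) for _ in range(N_BLOCKS)]
# Nominal day count: listeners "normally" ran two consecutive blocks per day.
print(sum(len(b) for b in schedule), "trials over", N_BLOCKS // BLOCKS_PER_DAY, "days")
# → 3072 trials over 8 days
```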

Results and Discussion

Overall performance. In general, all four listeners reached asymptotic performance within eight blocks (1536 trials), as may be seen in Figure 1.


Insert Figure 1 here


It is evident from this figure, however, that the four listeners differed widely in their final performance levels. For example, listener SG reached a final level of 85% correct, whereas listener PD remained at approximately 25% correct throughout the entire experiment. Although all listeners classified well above the 12.5% chance level, listeners EK and PD clearly had considerable difficulty in using both dimensions. In particular, PD's 25% performance level suggests that he was totally insensitive to differences in signal Quality. Individual listener performance is considered in more detail below.

The data also reveal higher overall performance for the Tempo partition than for the Quality partition (59% and 36% correct for the Tempo and Quality groups, respectively). This finding is consistent with the results reported by Howard et al. (1978), although even larger within-group differences were observed in the present study.

Theoretical analysis. The Gaussian classification model was used to estimate theoretical confusion matrices for each listener on each block. The theoretical matrices were determined by selecting the standard deviation parameters for the two features that minimized the discrepancy between the theoretical and observed matrices in a least-squares sense. A standard quasi-Newton gradient algorithm was used to perform the fits (subroutine ZXMIN in the IMSL statistical library). The reciprocals of the standard deviation parameters were used to estimate a subjective weight for each feature, and the two weights were summed to estimate the total attentional effort. The predicted confusion matrices were correlated with the observed matrices to judge the accuracy of the estimation. These correlations, averaged across blocks (using Fisher's z transformation), were .96, .89, .78, and .63 for listeners SG, EK, KR, and PD, respectively. Thus, the fits ranged from very good for listener SG to very poor for listener PD. The estimated parameters for each listener are plotted as a function of block in Figures 2 to 5.
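
The fitting procedure can be sketched in Python under several loudly stated assumptions: the four levels of each dimension lie at unit spacing, response criteria sit midway between levels (so only the criterion at 1.5 is used on the collapsed dimension), the dimensions are independent, and confusion matrices are expressed as row proportions. In place of the IMSL quasi-Newton routine ZXMIN, which is not reproduced here, the sketch minimizes the same least-squares criterion with a simple compass (pattern) search in log-sigma space; that substitution is a deliberate simplification. It also shows the reciprocal weights, their sum, and Fisher z averaging (z = atanh r) of correlations. The "observed" matrix is synthetic, generated from the model itself, and the per-block correlations are hypothetical; none of it is data from the study.

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def class_probs(bounds, sigma, n_levels=4):
    """P(response class j | true level i) on one dimension; levels sit at
    0..n_levels-1 (assumed unit spacing) and bounds are the sorted criteria."""
    edges = [-math.inf] + list(bounds) + [math.inf]
    return [[phi((edges[j + 1] - i) / sigma) - phi((edges[j] - i) / sigma)
             for j in range(len(edges) - 1)] for i in range(n_levels)]

def predict(sig_t, sig_q, t_bounds, q_bounds):
    """Predicted 16 x 8 confusion matrix (row proportions). Rows are stimuli
    (Tempo level, Quality level); columns are (Tempo class, Quality class)
    categories. Independence lets each cell factor into two 1-D terms."""
    pt, pq = class_probs(t_bounds, sig_t), class_probs(q_bounds, sig_q)
    return [[a * b for a in pt[t] for b in pq[q]] for t in range(4) for q in range(4)]

def sse(log_sigmas, observed, t_bounds, q_bounds):
    """Least-squares discrepancy between predicted and observed matrices."""
    pred = predict(math.exp(log_sigmas[0]), math.exp(log_sigmas[1]), t_bounds, q_bounds)
    return sum((p - o) ** 2 for pr, ob in zip(pred, observed) for p, o in zip(pr, ob))

def fit(observed, t_bounds, q_bounds, step=0.5, tol=1e-7):
    """Compass search in log-sigma space (a simple stand-in for IMSL ZXMIN)."""
    x = [0.0, 0.0]  # start at sigma = 1 on both dimensions
    best = sse(x, observed, t_bounds, q_bounds)
    while step > tol:
        moved = False
        for k in (0, 1):
            for d in (step, -step):
                trial = list(x)
                trial[k] += d
                val = sse(trial, observed, t_bounds, q_bounds)
                if val < best:
                    x, best, moved = trial, val, True
        if not moved:
            step /= 2.0  # no axis move helps: refine the step
    return math.exp(x[0]), math.exp(x[1])

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def fisher_mean(rs):
    """Average correlations via Fisher's z = atanh(r), then back-transform.
    Every r must lie strictly between -1 and 1."""
    return math.tanh(sum(math.atanh(r) for r in rs) / len(rs))

def flatten(matrix):
    return [v for row in matrix for v in row]

# Tempo partition: criteria between all four Tempo levels, one Quality criterion.
T_BOUNDS, Q_BOUNDS = [0.5, 1.5, 2.5], [1.5]
observed = predict(0.6, 1.2, T_BOUNDS, Q_BOUNDS)  # synthetic, noise-free "data"
sig_t, sig_q = fit(observed, T_BOUNDS, Q_BOUNDS)
w_t, w_q = 1.0 / sig_t, 1.0 / sig_q               # subjective feature weights
effort = w_t + w_q                                # total attentional effort
r = pearson(flatten(predict(sig_t, sig_q, T_BOUNDS, Q_BOUNDS)), flatten(observed))
print(f"sigma_T={sig_t:.3f} sigma_Q={sig_q:.3f} effort={effort:.3f} r={r:.3f}")
print(f"Fisher-z mean of hypothetical block correlations: {fisher_mean([0.9, 0.8, 0.7]):.3f}")
```

Working in log-sigma keeps both standard deviations positive without explicit constraints; with real, noisy confusion data the recovered parameters would of course not match any generating values exactly.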


Insert Figures 2-5 here


Since the overall attentional effort is simply the sum of the Tempo and Quality weights, one can determine which feature a listener emphasized by comparing the Tempo weight with the total attentional effort parameter.
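
As a small illustration of this comparison, the Python sketch below expresses the Tempo weight as a share of total attentional effort. The weight values are hypothetical, chosen to mimic a listener who attends almost only to Tempo, and the 0.60 reference share is the optimal Tempo-to-Quality split cited for the Tempo partition in the individual analyses.

```python
def tempo_share(w_tempo, w_quality):
    """Return total effort and the fraction of it devoted to Tempo."""
    effort = w_tempo + w_quality
    return effort, w_tempo / effort

# Hypothetical weights (reciprocal standard deviations) for one block.
effort, share = tempo_share(w_tempo=2.0, w_quality=0.25)
OPTIMAL_TEMPO_SHARE = 0.60  # 60:40 split cited for the Tempo partition
print(f"effort={effort:.2f}, Tempo share={share:.2f}, "
      f"deviation from optimal={share - OPTIMAL_TEMPO_SHARE:+.2f}")
# → effort=2.25, Tempo share=0.89, deviation from optimal=+0.29
```

When the Tempo share approaches 1, the Tempo weight and total effort curves overlap, which is the pattern used below to diagnose near-exclusive attention to Tempo.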

A question of primary interest in the present study concerns the long-term stability of the listeners' overall attentional effort. In particular, does this parameter continue to increase after the 720 trials examined in the Howard et al. (1978) experiment? To what extent do any increases beyond this point reflect additional tuning of the less important feature? An examination of Figures 2-5 reveals that, with the exception of listener KR, relatively little improvement occurs after the fourth block (i.e., after 768 trials). However, since there are interesting individual differences, the data of each listener are characterized in more detail below.

Listener SG had the best overall performance. She improved consistently over the first four blocks, as is evident in Figure 1. It is important to note, however, that her improved performance resulted from a fine-tuning of both the Tempo and Quality features. This is seen in Figure 2 in the initially diverging plots of the attentional effort and Tempo weight parameters. The more rapid increase in attentional effort reveals that Quality, as well as Tempo, was becoming increasingly important. It is also interesting to note that this listener achieved a nearly optimal division of attention between the two features. Beyond the second trial block, her estimated Tempo weight is nearly identical to that required for optimal classification performance (a Tempo-to-Quality ratio of 60:40). The stability of the attentional effort and Tempo weight curves suggests that this listener was doing about as well as she could. This raises the possibility that sensory limitations contributed to the performance ceiling observed in this case.

Figure 2. Obtained and optimal weight parameters by block for listener SG.
Figure 3. Obtained and optimal weight parameters by block for listener EK.
Figure 5. Obtained and optimal weight parameters by block for listener PD.

The second listener in the Tempo group, listener EK, showed little improvement after the fifth block. In contrast to SG, EK concentrated almost exclusively on the Tempo dimension. This is clearly seen in the overlapping attentional effort and Tempo weight curves in Figure 3. It is important to note that an optimal allocation of attention in this case would result in a 60:40 split between the Tempo and Quality dimensions. Since EK generally allocated between 80% and 100% of her attention to the Tempo dimension, her 40% correct performance level fell well below that observed for listener SG. Of particular interest in this case, however, are several instances in which EK increased her overall attentional effort by increasing her Quality weight. This occurred on blocks 5, 7, 8, 10, 14, and 16. On these occasions her performance improved, and her division of attention was closer to optimal. This suggests that, for EK, the performance ceiling reflected by the attentional effort parameter is not attributable to sensory limitations. It appears, rather, that this listener was able to distinguish important differences in signal Quality but did not do so in any consistent fashion. This interpretation is consistent with her post-experimental interview, in which she expressed little awareness of signal Quality as an independent cue.

Although listener KR was the better of the two listeners in the Quality group, she did not begin to improve significantly until the fifth block. The overlap between the overall attentional effort and Tempo weight curves for this listener over the first three days (Figure 4) indicates that she initially concentrated almost exclusively on Tempo, the less important feature. At block 4, however, KR "discovered" the Quality dimension, as is evidenced by the diverging overall effort and Tempo weight curves. From blocks 5 to 16, her attention was allocated primarily to the more important Quality dimension; her consistently higher performance during this interval is obvious in Figure 1. Despite this, KR focused more attention on Tempo than would have been optimal (see Figure 4). This observation, together with her relatively stable performance after block 5, suggests that KR could not do any better with the Quality dimension. As in the case of listener SG, this implies that sensory factors played an important role in limiting her performance.

Listener PD showed little improvement during the experiment. Most of his effort was allocated to Tempo, the less relevant dimension. In fact, his 25% correct performance level would be expected if he were accurately discriminating the two levels of Tempo and responding randomly with regard to Quality. The slight fluctuations in performance observed in Figure 1 are attributable to small increments in the Quality weight, since the Tempo weight remained constant over the 16 blocks. These observations were confirmed by PD's self-reported inability to distinguish the Quality differences. Since PD was familiar with the two-dimensional structure of the stimuli before beginning the experiment, his data clearly indicate a sensory limitation.
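
The arithmetic behind these ceilings is simple enough to spell out. In this Python sketch a dimension the listener discriminates perfectly contributes a factor of 1, and a dimension on which he or she guesses contributes 1 divided by the number of response classes on that dimension; the function name `expected_correct` is illustrative.

```python
from fractions import Fraction

def expected_correct(tempo_classes, quality_classes, knows_tempo, knows_quality):
    """Expected proportion correct when each dimension is either perfectly
    discriminated or guessed at random among its response classes."""
    p_tempo = Fraction(1) if knows_tempo else Fraction(1, tempo_classes)
    p_quality = Fraction(1) if knows_quality else Fraction(1, quality_classes)
    return p_tempo * p_quality

# Quality partition: 2 Tempo classes x 4 Quality classes.
chance = expected_correct(2, 4, knows_tempo=False, knows_quality=False)
tempo_only_quality_group = expected_correct(2, 4, knows_tempo=True, knows_quality=False)
# Tempo partition: 4 Tempo classes x 2 Quality classes.
tempo_only_tempo_group = expected_correct(4, 2, knows_tempo=True, knows_quality=False)
print(float(chance), float(tempo_only_quality_group), float(tempo_only_tempo_group))
# → 0.125 0.25 0.5
```

The 0.25 value matches PD's observed level, while 0.5 is the corresponding ceiling for a Tempo-group listener who ignores Quality altogether.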

Summary and conclusions. The present findings clearly indicate that extended practice alone is not likely to produce any significant improvement in a listener's ability to use less important, difficult signal features in aural classification. Although one listener (KR) did not "discover" the Quality dimension until block 4, very few changes occurred in the overall attentional effort parameter for any listener after block 5. However, the present findings also emphasize that the "attentional effort" parameter reflects sensory as well as cognitive factors. Two of our listeners, EK and PD, found it extremely difficult to discriminate differences in signal Quality. On the other hand, listeners SG and KR could make use of this feature, albeit to a limited degree. There was no doubt, however, that Tempo was the easier of the two dimensions for all four listeners. Further research with the Howard et al. (1978) aural classification model should address the issue of how sensory and cognitive factors contribute to the attentional effort parameter. Similar work along these lines by Gravetter and Lockhead (1973) suggests that such an approach may prove fruitful.